Conservative stemming for search and indexing
Abstract
In this paper, we describe a stemmer that is designed to stem conservatively, producing orthographically correct word forms and recognizing words that do not need to be stemmed, such as proper nouns. We compare the performance of our stemmer with that of several other stemmers and propose further work to make it more effective for information retrieval, topic detection, and other linguistic applications.
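The abstract does not give the algorithm itself, so the following is only a minimal sketch of what conservative stemming could look like, not the authors' method. It assumes a tiny hypothetical lexicon (`LEXICON`), a handful of illustrative suffix rules (`SUFFIX_RULES`), and a simple capitalization heuristic for proper nouns; all of these names and choices are assumptions made for illustration.

```python
# Hypothetical toy lexicon; a real system would use a full dictionary.
LEXICON = {"index", "search", "stem", "retrieve", "city", "engine"}

# Illustrative (suffix, replacement) rules, tried in order. Assumed, not from the paper.
SUFFIX_RULES = [
    ("ies", "y"),
    ("ing", ""),
    ("ing", "e"),
    ("es", ""),
    ("ed", ""),
    ("s", ""),
]


def conservative_stem(word, sentence_initial=False):
    """Return a stem only when it is a known, correctly spelled word form."""
    # Heuristic assumption: a capitalized word that does not start a
    # sentence is treated as a proper noun and left untouched.
    if word[:1].isupper() and not sentence_initial:
        return word

    lower = word.lower()
    if lower in LEXICON:
        return lower  # already a valid base form

    for suffix, replacement in SUFFIX_RULES:
        if lower.endswith(suffix):
            candidate = lower[: -len(suffix)] + replacement
            # Accept the candidate only if it is a real word in the lexicon.
            if candidate in LEXICON:
                return candidate

    # No rule produced a known word, so leave the input unchanged.
    return word


for w in ["cities", "indexes", "retrieving", "running", "Wales"]:
    print(w, "->", conservative_stem(w))
```

In this sketch, a word is altered only when the result is found in the lexicon, so unknown forms such as "running" (whose base is absent from the toy lexicon) pass through unchanged rather than being truncated to a non-word.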
Similar resources
Multi-Language Text Indexing for Internet Retrieval
We address here the issues associated with indexing multilingual collections of information, such as those found on the internet. We examine in particular the task of language identification and the use of stemming algorithms for several European languages. We also present the lessons we have learned from our experience in using the SPIDER information retrieval system as a search engine over...
A Comparing between the impacts of text based indexing and folksonomy on ranking of images search via Google search engine
Background and Aim: The purpose of this study was to compare the impact of text-based indexing and folksonomy on image retrieval via the Google search engine. Methods: This study used an experimental method. The sample was 30 images extracted from the book “Gray anatomy”. The research was carried out in 4 stages; in the first stage, images were uploaded to an “Instagram” account so the images are tagge...
Information Retrieval Effectiveness of Turkish Search Engines
This is an investigation of information retrieval performance of Turkish search engines with respect to precision, normalized recall, coverage and novelty ratios. We defined seventeen query topics for Arabul, Arama, Netbul and Superonline. These queries were carefully selected to assess the capability of a search engine for handling broad or narrow topic subjects, exclusion of particular inform...
On Effective Conceptual Indexing and Similarity Search in Text Data
Similarity search in text has proven to be an interesting problem from the qualitative perspective because of inherent redundancies and ambiguities in textual descriptions. The methods used in search engines in order to retrieve documents most similar to user-defined sets of keywords are not applicable to targets which are medium to large size documents, because of even greater noise effects st...
Context based Indexing in Information Retrieval System using BST
An information retrieval system searches for data relevant to a query. Keyword searching is the basic idea of this system, which tries to solve the large search space problem, since the documents to be searched can be of any length. This means that search time increases with document length. Search time can be reduced by reducing the search space. In this, we are constructing a met...